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Abstract 

Recent advances in population genomics have made it possible to detect previ- 
ously unidentified structure, obtain more accurate estimates of demographic 
parameters, and explore adaptive divergence, potentially revolutionizing the way 
genetic data are used to manage wild populations. Here, we identified 10 944 sin- 
gle-nucleotide polymorphisms using restriction-site-associated DNA (RAD) 
sequencing to explore population structure, demography, and adaptive diver- 
gence in five populations of Chinook salmon (Oncorhynchus tshawytscha) from 
western Alaska. Patterns of population structure were similar to those of past 
studies, but our ability to assign individuals back to their region of origin was 
greatly improved (>90% accuracy for all populations). We also calculated effec- 
tive size with and without removing physically linked loci identified from a link- 
age map, a novel method for nonmodel organisms. Estimates of effective size 
were generally above 1000 and were biased downward when physically linked loci 
were not removed. Outlier tests based on genetic differentiation identified 733 
loci and three genomic regions under putative selection. These markers and 
genomic regions are excellent candidates for future research and can be used to 
create high-resolution panels for genetic monitoring and population assignment. 
This work demonstrates the utility of genomic data to inform conservation in 
highly exploited species with shallow population structure. 



Introduction 

Discrete management of genetically distinct populations 
can increase species-wide resilience and stabilize the pro- 
ductivity of ecosystems as a whole (Hilborn et al. 2003; 
Schindler et al. 2010). For over three decades, genetic data 
from 10 to 100 putatively neutral markers have been used 
to identify discrete populations, define conservation units, 
and estimate demographic parameters (Utter et al. 1974; 
Wirgin and Waldman 1994; Waples et al. 2008). The use of 
genetic data for management has been especially successful 
in Pacific salmon [Oncorhynchus spp.) which exhibit exten- 
sive population structure (Utter and Ryman 1993; Shaklee 
et al. 1999). However, applications have been limited for 
recently isolated populations of salmonids (Taylor et al. 



1997) or marine species with little neutral structure 
(Waples 1998). In these circumstances, data from thou- 
sands of markers (genomic data) may be necessary to 
resolve population structure and aid management. 

Genomic data can provide accurate estimates of neutral 
population structure (Avise 2010; Funk et al. 2012; Na- 
rum et al. 2013), identify genomic regions that display 
adaptive divergence (Allendorf et al. 2010; Angeloni et al. 
2012), and provide increased accuracy when estimating 
demographic parameters (Allendorf et al. 2010). Geno- 
types from thousands of loci have been used to elucidate 
neutral structure in populations of Pacific lamprey {Ento- 
sphenus tridentatus, Hess et al. 2013) and to improve res- 
olution of fine-scale structure in Atlantic salmon (Salmo 
salar, Bourret et al. 2013). Additionally, genome scans 
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have revealed adaptively important markers and genomic 
regions in sockeye salmon (Oncorhynchus nerka, Russello 
et al. 2012), Atlantic cod {Gadus morhua, Bradbury et al. 
2013; Hemmer-Hansen et al. 2013), and lake whitefish 
(Coregonus clupeaformis, Renaut et al. 2012). Although 
many studies have used genomic data to elucidate struc- 
ture in nonmodel organisms, demographic parameters 
such as effective size are rarely estimated with these types 
of data. 

Effective population size (N e ) is an important parameter 
in conservation biology (Frankham 2005), but methods to 
calculate N e with genomic data are lacking (Waples and Do 
2010). Specifically, many calculations of N e require knowl- 
edge of linkage relationships, which are often unknown for 
nonmodel organisms. A possible solution to this problem 
is the use of high-density linkage maps that can now be cre- 
ated rapidly for many species with genotyping by sequenc- 
ing (e.g., Baxter et al. 2011; Miller et al. 2012; Gagnaire 
et al. 2013). Using data from these maps, it is possible to 
obtain estimates of N e that are not biased by physical link- 
age. To the best of our knowledge, this method has only 
been implemented in populations of model organisms 
(Park 2011; Sved et al. 2013), but the increasing availability 
of linkage maps will facilitate N e estimation in many spe- 
cies of conservation concern. 

Chinook salmon (Oncorhynchus tshawytscha) from wes- 
tern Alaska represent an excellent system to explore the 
utility of genomics in a management context. Chinook sal- 
mon inhabit four major regions in western Alaska: Norton 
Sound, the Yukon River, the Kuskokwim River, and Bristol 
Bay, all of which vary significantly in size, hydrology, and 
climate (Fig. 1, Olsen et al. 2011). The Kuskokwim and 
Yukon regions are composed of a mainstem river with 
many tributaries, whereas Norton Sound is composed of 



many unconnected and short rivers (mean length 
-110 km; Olsen et al. 2011). The Bristol Bay region is com- 
posed of several river systems each with smaller tributaries 
(i.e., Nushagak, Togiak, Naknek rivers). Past studies using 
allozymes and single-nucleotide polymorphisms (SNPs) 
found evidence of structure in western Alaska, but con- 
cluded that differences among populations in Norton 
Sound, the lower portions of the Yukon and Kuskokwim 
rivers, and Bristol Bay, were insufficient to allocate mixture 
samples back to their region of origin (Gharrett et al. 1987; 
Templin et al. 2011). Returns of Chinook salmon to wes- 
tern Alaska over the past decade have been approximately 
20% lower than their long-term average, renewing interest 
in the migration patterns and vulnerability of stocks to 
fisheries in this region (ADF&G 2013). Improved resolu- 
tion of population structure would allow managers to 
investigate these questions using genetic tools. Addition- 
ally, estimates of N e and other demographic parameters 
could help to inform conservation and management efforts 
across the region. 

We used restriction-site-associated DNA (RAD) 
sequencing to investigate the population structure and 
demography of Chinook salmon from western Alaska. We 
identified over 10 000 SNPs in 270 individuals from five 
populations across western Alaska. Patterns of genetic vari- 
ation were assessed using both population- and individual- 
based methods and validated with assignment tests. We 
then aligned our RAD markers to a linkage map to calcu- 
late N e with and without removing physical linkage. We 
also conducted outlier tests and used the linkage map to 
detect loci and genomic regions under putative selection. 
This approach defines an important way that genomics can 
be used to inform management of nonmodel species with 
high gene flow. 
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Figure 1 Map of sampling locations. See Table 1 for additional details about each sampling site. 
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Materials and methods 

Tissue sampling 

Tissue samples from spawning Chinook salmon were 
available from four regions in coastal western Alaska and 
one in the upper Yukon River (Fig. 1, Table 1). We 
selected populations that did not have unusually small 
census sizes and that were genetically similar to proximate 
populations identified from previous studies (Olsen et al. 
2011; Templin et al. 2011); this approach ensured that 
our conclusions were based on populations that were rep- 
resentative of each region. Chinook salmon from the 
upper Yukon River are highly differentiated from those of 
western Alaska (Smith et al. 2005; Templin et al. 2011) 
and were included to anchor inferences of population 
structure. 

RAD sequencing, SNP discovery and genotyping 

Restriction-site-associated DNA libraries were prepared 
with the restriction enzyme Sbfl following the methods of 
Baird et al. (2008) and Everett et al. (2012) and sequenced 
on an Illumina HiSeq2000 at the University of Oregon 
Genomics Core Facility. We constructed 18 libraries for 
single-end sequencing (100 bp target length) containing 
12-24 individuals per library and one library for paired- 
end sequencing (100 x 2 bp target) containing eight indi- 
viduals to assemble longer sequence contigs for annotation. 
Pooled individuals were identified with unique 6-bp bar- 
codes. 

We used the Stacks software package, version 0.9999 
(Catchen et al. 2011) and methods similar to Hohenlohe 
et al. (2013) to discover and genotype SNPs from the 
sequenced RAD tags. Quality filtering of raw reads and 
demultiplexing based on barcode was conducted using 
process _radtags. Stacks of similar sequences were then 
assembled in each individual with ustacks, and a catalog of 
loci was created with cstacks. We included only the two 
individuals from each population with the greatest 
amount of sequence data when creating our catalog to 
reduce the detection of false polymorphisms. Including 
more individuals per population would have facilitated 
the detection of low-frequency SNPs, but would not have 



added additional SNPs to the final data set because these 
low-frequency SNPs were filtered out in downstream 
analyses. Finally, we used sstacks and populations to com- 
bine the genotypes from each individual into a single 
Genepop formatted file. 

SNP validation 

Putative SNPs discovered using Stacks were filtered to 
remove possible sequencing errors, paralogous sequence 
variants (PSVs), and uninformative polymorphisms. First, 
we removed any putative SNP that failed to genotype in 
>80% of individuals. We then removed those with a minor 
allele frequency <0.05 in all populations. These polymor- 
phisms are likely to be uninformative, are difficult to dis- 
tinguish from sequencing errors, can distort signals of 
selection and drift in natural populations, and may bias 
tests for selection (Roesti et al. 2012). We also discarded 
putative SNPs that were found at RAD tag positions 
>87 bp because these positions contained more polymor- 
phisms on average than the rest of the sequence (138 per 
bp for bp 1-87, 223 per bp for bp 88-93). This increase in 
putative SNPs per base pair is likely a result of sequencing 
errors as Illumina sequencing is more error prone toward 
the terminal positions of reads (Minoche et al. 2011). We 
kept only the putative SNP with the highest P ST from each 
RAD tag to reduce linkage in our data set. We also used the 
program PLINK version 1.07 (Purcell et al. 2007) to test 
for linkage disequilibrium between each pair of loci. If a 
locus pair had an r 2 value >0.8 in three of five populations, 
we removed the locus that was genotyped in the fewest 
individuals. 

Paralogous sequence variants, which are abundant in sal- 
monids as a result of an ancient whole-genome duplication 
event (Allendorf and Thorgaard 1984; Seeb et al. 2011), 
were removed from the data set when possible. PSVs are 
closely related sequences from different genomic locations 
that do not segregate as single loci and are therefore diffi- 
cult to genotype accurately (Gidskehaug et al. 2011). Hap- 
loid individuals can be used to identify PSVs because PSVs 
will appear heterozygous when all correctly segregating loci 
are homozygous (Hecht et al. 2013). To screen for PSVs in 
our data, we genotyped 50 haploid Chinook salmon from 



Table 1. Populations analyzed in this study with year sampled, sample size (W), observed heterozygosity (H 0 ), and expected heterozygosity (H E ). 



Sampling location 


Region 


Year 


GPS coordinates 


N 


H 0 


He 


Tubutulik River 


Norton Sound 


2009 


64.740, -161.888 


56 


0.248 


0.252 


Anvik River 


Lower Yukon R 


2007 


62.681, -160.214 


54 


0.260 


0.261 


Kogrukluk River 


Kuskokwim R 


2007 


60.841,-157.846 


57 


0.251 


0.258 


Koktuli River 


Bristol Bay 


2010 


59.93S, -156.427 


56 


0.256 


0.259 


Big Salmon River 


Upper Yukon R 


2007 


61.867, -134.917 


47 


0.232 


0.232 
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Washington, USA, at all putative SNPs discovered above, 
and loci with >10% heterozygosity were removed. 

We also conducted exact tests of Hardy-Weinberg equi- 
librium in Genepop version 4 (Rousset 2008) and removed 
loci that were out of equilibrium in three or more popula- 
tions (P < 0.05). We then removed individuals that were 
missing genotypes at > 15% of the SNPs. As a final filtra- 
tion step, we used ML-Relate (Kalinowski et al. 2006) to 
look for duplicated individuals in our data. 

Paired-end assembly and BLAST annotation 

We conducted a paired-end assembly with the PI and P2 
reads from each locus using Velvet (v. 1.1.06, Zerbino and 
Birney 2008) and the methods of Etter et al. (2011) and 
Everett et al. (2012) to increase query lengths for BLAST 
annotation. Consensus sequences were then aligned to the 
Swiss-Prot database using the BLASTX search algorithm. 
Alignments with E-values of <10~ 4 were retained. If multi- 
ple alignments had E-values of <10~ 4 for the same locus, 
the alignment with the lowest £-value was retained. 

Population structure and assignment tests 

Initial analysis of population structure was conducted with 
an individual-based principal component analysis (PCA) 
implemented in the R package adegenet (Jombart 2008). 
The significance of each principal component was assessed 
by randomly permuting the data 1000 times and compar- 
ing the observed eigenvalues to values generated by con- 
ducting PCA on the permuted data. PCA revealed five 
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Figure 2 Individual-based principal component analysis for all popula- 
tions and 1 0 944 SNPs. The five intermediate individuals from the Anvik 
River were removed from further analyses (see Methods). 



nonconforming individuals in the Anvik River collection 
that grouped between the Big Salmon River and Anvik 
River clusters (Fig. 2). These five individuals were removed 
from further analyses as they likely represent transient fish 
from middle or upper Yukon River populations. After 
removing these individuals, we calculated pairwise F ST val- 
ues (Weir and Cockerham 1984) for each population and 
performed significance tests for genetic differentiation in 
Arlequin 3.5 (Excoffier and Lischer 2010) using an exact 
test with 10 000 permutations. 

We conducted an analysis of molecular variance (amova) 
in Arlequin 3.5 to examine the variation within and among 
groups of genetically similar populations. The hierarchy for 
this analysis was chosen based on the clustering from the 
PCA: (i) Koktuli River, Kogrukluk River, and Anvik River, 
(ii) Tubutulik River, and (iii) Big Salmon River. Separate 
amovas were conducted for (i) the entire data set and (ii) 
all populations except the Big Salmon River. Finally, we 
calculated global and per-locus observed and expected het- 
erozygosities for each population in GenAlEx 6.5 (Peakall 
and Smouse 2012). 

We examined fine-scale structure in the closely related 
Anvik River, Kogrukluk River, and Koktuli River popula- 
tions with an individual PCA including only these three 
populations (see above for PCA methods). This analysis 
was conducted separately for the 10 944 RAD SNPs and 39 
of the 43 SNPs from Templin et al. (2011) that were devel- 
oped for Chinook salmon from expressed sequence tags. Of 
the four SNPs from Templin et al. (2011) that were not 
genotyped, two were removed because they were essentially 
monomorphic in other populations from western Alaska 
and two were removed because they were in linkage dis- 
equilibrium with another locus (Templin et al. 2011). 

Assignment power of four panels was evaluated with 
leave-one-out tests in GeneClass2.0 (Piry et al. 2004) to 
compare the influence of number of SNPs and relative 
divergence of SNPs on assignment accuracies. The four 
panels were (i) 39 SNPs from Templin et al. (2011), (ii) 39 
randomly chosen SNPs from the complete data set of 
10 944 RAD SNPs, (iii) the complete data set of 10 944 
RAD SNPs, and (iv) the full set of RAD SNPs with the 733 
outlier SNPs that were found to be under putative selection 
removed. We did not construct a panel with only the most 
divergent RAD SNPs because this approach would have led 
to an upward bias in the predicated accuracy of assignment 
for that panel (Anderson 2010). Leave-one-out tests were 
conducted by removing an individual from the baseline 
without replacement then assigning that individual back to 
a reference population using a Bayesian approach described 
in Rannala and Mountain (1997). Individuals were consid- 
ered to be assigned to a population if the assignment prob- 
ability to that population was higher than to any other 
population. 
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Alignment to linkage map 

We aligned our filtered loci to a linkage map for Chi- 
nook salmon consisting of 3534 RAD-derived SNPs dis- 
tributed across 34 linkage groups ranging in size from 
27.75 to 160.23 cM (Table SI, Everett and Seeb 2014). 
To conduct the alignments, we used BLASTN (Altschul 
et al. 1990) with the following parameters: minimum 
alignment length of 90 bp, 95% identity, and no more 
than two mismatching bases. If a single locus aligned to 
multiple map loci, we discarded all alignments for that 
locus. We used relatively strict alignment parameters for 
this analysis because sequence alignment in tetraploid- 
origin salmonids can provide ambiguous results when 
alignment parameters are not sufficiently strict (Everett 
et al. 2011; Seeb et al. 2011). 

Calculating N e and NJN 

Estimates of N e were performed with the linkage disequilib- 
rium method (Hill 1981; Waples 2006) updated for missing 
data following Peel et al. (2013). This method assumes all 
loci in the analysis are physically unlinked then utilizes the 
observed linkage disequilibrium to estimate N e . We 
removed comparisons between loci on the same linkage 
groups to obtain estimates that were unbiased by physical 
linkage (Park 2011; Sved et al. 2013). Additionally, we 
removed all loci that were putatively under selection as sug- 
gested by Waples (2006) (see below for description of tests 
for loci under selection). Calculations of N e were con- 
ducted using N e Estimator (Do et al. 2014) and R (R core 
development team 201 1). N e Estimator was used to calculate 
r 2 values for each locus pair with the following parameters: 
a minimum allele frequency cutoff of 0.02 and a random 
mating model. We then implemented the methods 
described in Waples (2006) and Peel et al. (2013) in R to 
obtain N e estimates and parametric 95% confidence inter- 
vals for each population (scripts available from W. Larson 
upon request). We calculated N e for three data sets: (i) all 
RAD SNPs that aligned to the linkage map, (ii) all RAD 
SNPs that aligned to the map with pairwise comparisons 
between markers on the same linkage group removed, and 
(iii) the 39 SNPs from Templin et al. (2011) that were in 
linkage equilibrium. 

We calculated the ratio of effective size to census size 
(NJN) using N e calculated with RAD-derived SNPs after 
removing physical linkage and estimates of total escape- 
ment obtained from aerial surveys (Koktuli River, Anvik 
River, Tubutulik River) and weir counts (Kogrukluk River, 
Big Salmon River). Multiple aerial surveys were used to 
estimate total run size for the Anvik River and Koktuli 
River populations, but only single aerial counts were avail- 
able for the Tubutulik River population. Single aerial 



counts from a river near the Tubutulik River collection 
were approximately four times smaller than those taken 
from a counting tower; therefore, we multiplied the aerial 
count from our collection by four. We averaged the last ten 
years of data to obtain an approximate value of census size 
for each population (only last 3 years used for Koktuli 
River due to data availability). 

Estimates of N e for Chinook salmon populations are 
complicated by the fact that multiple cohorts are repre- 
sented in each spawning group (Waples 1990). Single-sam- 
ple estimates of N e therefore do not precisely reflect the 
effective number of breeders per year or the effective num- 
ber of breeders per generation, but instead represent some 
intermediate value. We calculated two NJN ratios to 
bracket these possible scenarios: N e divided by the average 
census size (escapement) per year (NJN) and N e divided 
by the total census size per generation (NJNG). Values of 
G for each population were obtained from the sources in 
Table 5 by averaging age compositions across one to 
38 years of data depending on availability. 

Detection of loci under putative selection 

We identified putative loci under selection with Arlequin 
3.5. This program uses coalescent simulations to create a 
null distribution of F-statistics then generates P-values for 
each locus based on this distribution and observed het- 
erozygosities across loci (Excoffier et al. 2009). A hierar- 
chical island model was selected to reduce false positives 
introduced due to underlying population structure (Ex- 
coffier et al. 2009). The population hierarchy was the 
same as in the amova. Settings for the analysis were 
20 000 simulations, 10 simulated groups, and 100 demes 
per group. Loci that fell above the 95% quantile of the 
F ST distribution were considered candidates for direc- 
tional selection. 

Detection of candidate genomic regions under selection 

We used a linkage map in conjunction with a sliding win- 
dow analysis to identify highly divergent regions of the gen- 
ome that maybe under selection (c.fi, Bourret et al. 2013). 
This analysis was conducted with a sliding window 
approach that compares the mean pairwise F S t of a small 
(5 cM) genomic region to a null distribution created by 
bootstrapping over the complete data set (Hohenlohe et al. 
2010; Bourret et al. 2013). For each window, we sampled N 
F ST values with replacement from the entire data set where 
N was the number of SNPs in the window. This resampling 
routine was repeated 1000 times to generate a null distribu- 
tion. Windows with mean F ST values above the 95% quan- 
tile of the null distribution were candidates for directional 
selection. If a window mean was above 90% after 1000 rep- 
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licates, we increased the number of replicates to 5000 to 
improve accuracy in the tails of the null distribution. We 
chose a sliding window size of 5 cM and frame shift value 
of 1 cM. We also required at least two SNPs to be present 
in a window to conduct the above test. After testing multi- 
ple window sizes, we found that a 5-cM window provided 
sufficient resolution for detecting divergent regions without 
introducing excessive variance. This value was also used by 
Bourret et al. (2013) for linkage groups with similar num- 
bers of markers to ours. We conducted this analysis for all 
pairwise population comparisons. 

Results 

Sequencing, SNP discovery and filtration 

We obtained RAD data from 289 individuals, and the num- 
ber of sequences obtained for each individual ranged from 
1 622 400 to 8 707 337 with an average of 3 796 368 
(excluding low-quality individuals, see below). Alignments 
using Stacks revealed 42 351 putative SNPs. Removing 
putative SNPs that were genotyped in <80% of individuals 
eliminated more than half of these, leaving 20 296. After 
removing polymorphisms in bp 87-94 of each RAD tag, 
removing all but one putative SNP from each tag, and 
removing SNPs with minor allele frequency <0.05, 12 585 
SNPs remained. Screens for paralogous sequence variants 
revealed 845 loci that were potentially duplicated; these loci 
were eliminated. Significant deviations from Hardy-Wein- 
berg equilibrium were observed in 397 SNPs, and these loci 
were also removed. Significant linkage disequilibrium in 
three or more populations was found for 399 SNPs, and 
one SNP from each pair was removed. The final filtered 
data set consisted of 10 944 SNPs. We removed 17 individ- 
uals that were genotyped in <85% of SNPs, seven from the 
Kogrukluk River, four from the Anvik River, and six from 
the Big Salmon River (adjusted sample sizes in Table 1). 
Relatedness analysis revealed two pairs of duplicated indi- 
viduals (R > 0.9) from the Anvik River population, and 
one individual from each pair was removed. The final fil- 
tered data set consisted of 270 individuals genotyped at 
10 944 SNPs. Summary statistics for each locus are avail- 
able in Table SI, and histograms of overall and pairwise P ST 
for each locus are in Fig. SI. 

Paired-end assembly and BLAST annotation 

Paired-end assemblies produced 11 666 contigs with an 
average length of 268 bp (minimum 150 bp, maximum 
565 bp). BLAST annotation of these contigs yielded signifi- 
cant hits for 1576 (14%) of 10 944 SNPs (Table S2). Of 
these hits, over one-third aligned to transposable elements. 
Other common functional groups included DNA polyme- 
rases and transmembrane proteins. 



Population structure 

Principal component analysis revealed that the Big Salmon 
River and Tubutulik River populations formed completely 
separate clusters, while the Koktuli River, Kogrukluk River, 
and Anvik River populations essentially formed a single 
cluster (Fig. 2). The overall F ST of the full data set was 
0.041, and pairwise P S t values ranged from 0.003 for the 
Koktuli River-Kogrukluk River comparison to 0.098 for 
the Big Salmon River-Tubutulik River comparison 
(Table 2). Genetic differentiation between all population 
comparisons was highly significant (P < 0.001). The results 
of these significance tests should, however, be interpreted 
with extreme caution due to the large number of loci, 
which may overestimate precision. 

We conducted hierarchical amova for the entire data set 
and for a data set without the Big Salmon River population 
(Table 3). Both analyses displayed much larger variation 
among groups than within groups. Levels of observed het- 
erozygosity across populations ranged from 0.232 for the 
Big Salmon River to 0.260 for the Anvik River (Table 1). 

When the Koktuli River, Kogrukluk River, and Anvik 
River populations were analyzed separately with 10 944 
SNPs, all populations generally formed discrete clusters, 
but some overlap was present between the Koktuli River 
and Kogrukluk River populations (Fig. 3A). Additionally, 
populations from the Anvik River and Kogrukluk River 
each contained a subset of 10-20 individuals that fell out- 



Table 2. Pairwise f ST values calculated using 10 944 SNPs and number 
of genomic regions that were under putative selection (in parentheses). 
All pairwise comparisons are significantly differentiated (P < 0.01 ). 



Tubutulik 


Kogrukluk 


Koktuli 


River Anvik River 


River 


River 


Anvik River 0.030 (20) 






Kogrukluk River 0.027 (20) 0.005 (20) 






Koktuli River 0.028 (20) 0.006 (23) 


0.003 (20) 




Big Salmon River 0.098 (24) 0.075 (25) 


0.075 (21) 


0.077(25) 


Table 3. Results from two amovas with 1 0 944 SNPs. 






Percentage 


Source of variation 


d.f. 


of variation 


All populations 






Among groups 


2 


5.26 


Among populations within groups 


2 


0.45 


Within populations 


529 


94.32 


Big Salmon River excluded 






Among groups 


1 


2.41 


Among populations within groups 


2 


0.43 


Within populations 


436 


97.18 
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Figure 3 Individual-based principal component analysis for the Anvik 
River, Kogrukluk River, and Koktuli River populations using (A) 10 944 
RAD SNPs and (B) 39 SNPs from Templin et al. (201 1). 



side the main cluster. When PCA was conducted with the 
41 SNPs from Templin et al. (2011), no clustering pattern 
was apparent (Fig. 3B). 

The relatively small amount of variation (1-5%) 
explained by the first and second principal components 
(PCs) in our PCAs (Figs 2 and 3) can be attributed to 
the large number of axes used. Each PCA contained as 
many axes as individuals plotted, so PCA using all popu- 
lations contained 270 axes, and the PCA with three pop- 
ulations contained 163. PCs one and two in both PCAs 
each explained more than three times the variation of 
the average axis and explained significantly more varia- 



tion than would be expected if no real correlation 
existed (P < 0.001), but because of the large number of 
axes, the actual proportion of variation explained was 
small. 

Assignment accuracy was much higher using >10 000 
SNPs (> 89% assignment to correct population) com- 
pared to 39 SNPs (-50% assignment to correct popula- 
tion, Table 4). Panels containing close to the same 
number of SNPs generally performed similarly, but the 
39 SNPs from Templin et al. (2011) did perform slightly 
better than the 39 randomly chosen RAD SNPs, and the 
panel containing all 10 944 RAD SNPs performed slightly 
better than the panel with the 733 outlier SNPs removed 
(Table 4). 

Alignment to linkage map 

Of the 10 944 filtered loci, 1156 were successfully placed on 
the linkage map (33% of loci on the map successfully 
aligned to one of the 10 944 loci discovered in populations 
from western Alaska, see Table SI for map location of suc- 
cessful alignments). This proportion may seem small, but it 
is important to note that the map was constructed using a 
single Chinook salmon from Washington State. Chinook 
salmon from Washington State are substantially diverged 
from populations in western Alaska (Templin et al. 2011), 
therefore, it is likely that many RAD tags did not contain 
loci that were polymorphic in both the mapping cross and 
our study populations and were not useful for our analyses. 
Additionally, because only one individual was used for the 
mapping cross, our alignments were limited to the RAD 



Table 4. Results of leave-one-out tests for individual assignment with 
four SNP panels. Panels are: (1) 39 EST: 39 SNPs previously developed 
for Chinook salmon from expressed sequence tags (ESTs, Templin et al. 
201 1), (2) 39 RAD: 39 randomly chosen SNPs from the complete data 
set of 1 0 944 RAD SNPs, (3) 1 0 944 RAD: the complete data set of RAD 
SNPs, and (4) 10 21 1 RAD no outliers: the full set of RAD SNPs with the 
733 outlier SNPs that were found to be under putative selection 
removed. Individuals were considered to be correctly assigned if the 
assignment probability to population of origin was higher than to any 
other population. See Table S3 for assignment probabilities for each 
individual. 

% Correct assignment 

10 211 RAD 



Regions 


39 EST 


39 RAD 


1 0 944 RAD 


no outliers 


Tubutulik River 


67 


65 


100 


100 


Anvik River 


46 


30 


91 


89 


Kogrukluk River 


34 


30 


93 


93 


Koktuli River 


29 


30 


98 


95 


Big Salmon River 


96 


87 


100 


100 
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tags containing SNPs that segregated in the mapped indi- 
vidual. 

Demographic estimates 

Estimates of N e with the RAD-derived SNPs were highly 
variable across populations, ranging from close to 500 in 
the Anvik River to infinity for the Koktuli River (Table 5). 
These estimates were calculated using SNPs that were suc- 
cessfully aligned to the linkage map, providing over 
500 000 pairwise comparisons between loci. Pairwise com- 
parisons between SNPs located on the same linkage group 
represented about 20 000 of the 500 000 comparisons 
(6%). These 20 000 comparisons were removed to estimate 
N e between physically unlinked loci. Estimates of N e were 
consistently smaller for the data set that included all com- 
parisons (Table 5). This downward bias was not uniform, 
however, as estimates from Norton Sound appeared to be 
more affected by linkage than estimates for the other popu- 
lations. 

Estimates of N e with the 39 SNPs from Templin et al. 
(2011) ranged from 209 for the Anvik River to infinity for 
the Koktuli River, Kogrukluk River, and Tubutulik River 
populations. Confidence intervals for each estimate using 
39 SNPs included infinity and were larger than confidence 
intervals around estimates from the RAD-derived SNPs. 

Estimates of NJN and N e /NG were extremely variable, 
ranging from 0.17 and 0.03 for the Kogrukluk River popu- 
lation to 0.59 and 0.11 for the Tubutulik River population 
(Table 5). We did not calculate NJN or N e /NG for the 
Koktuli River or Big Salmon River populations because the 
confidence intervals around N e included infinity, suggest- 
ing our point estimates of N e may not be completely repre- 
sentative. 



Loci and genomic regions under putative selection 

Outlier tests in Arlequin revealed 733 loci (6.7%) that were 
significant outliers at the 5% level and 178 (1.6%) that were 
significant at the 1% level. BLAST annotation of the out- 
liers at the 5% level revealed 96 significant hits (13% 
success rate). Transposable elements represented over one- 
third of the significant hits which is consistent with the 
pattern from the complete data set. 



Significant comparisons 




Figure 4 Regions of the genome under putative selection as inferred 
by pairwise F ST across all population pairs. Each vertical line represents a 
linkage group, and the length of the line is proportional to the size of 
the linkage group in cM. Shaded areas indicate regions which are signif- 
icantly diverged in at least one population pair indicating putative selec- 
tion. The color of the shading corresponds to the number of significant 
pairwise population comparisons with red and purple indicating over 
half of the population pairs are divergent in the given region. 



Table 5. Estimates of effective population size (W e ) for five populations calculated with 1118 RAD-derived SNPs that were placed on the linkage map 
and 39 of the 43 SNPs that were in linkage equilibrium from Templin et al. (201 1 ). Estimates with RAD SNPs are calculated using only comparisons 
between loci on different linkage groups (W e linkage removed) and all comparisons (N e all data). The ratio of effective population size to census size 
(NJN) and effective population size to census size multiplied by generation length (NJNG) for each population is also reported (G is generation length 
and W is an approximate value of yearly escapement for each population, see methods). The W e used for these calculations is A/ e linkage removed (col- 
umn 2). We did not calculate NJN or NJNG for the Bristol Bay and upper Yukon populations because confidence intervals included infinity, suggest- 
ing our point estimates may not be completely representative. 

Population A/ e linkage removed W e all data W e 39 SNPs G W NJN NJHG Source of W Source of G 



Tubutulik River 


1909 


808 


Inf 


5.43 


3100 


0.62 


0.11 


Banducci et al. 


Lingnau 




(1295-3602) 


(674-1009) 


(174-lnf) 










(2007) 


(1996) 


Anvik River 


516 


505 


209 


5.48 


1700 


0.30 


0.06 


Howard et al. 


Sandone 




(451-604) 


(443-586) 


(65-lnf) 










(2009) 


(1995) 


Kogrukluk River 


2026 


1723 


Inf 


5.20 


12 000 


0.17 


0.03 


Williams and 


Howard et al. 




(1375-3825) 


(1233-2842) 


(134-lnf) 










Shelden (2011) 


(2009) 


Koktuli River 


Inf 


26 071 


Inf 


5.13 


6000 


N/A 


N/A 


Woody (2012) 


Howard et al. 




(6055-lnf) 


(3733-lnf) 


(Inf-lnf) 












(2009) 


Big Salmon 


13 101 


4243 


520 


5.65 


5000 


N/A 


N/A 


Mercer and 


Howard et al. 


River 


(1505-lnf) 


(1806-lnf) 


(70-lnf) 










Wilson (2011) 


(2009) 
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The number of genomic regions under putative selection 
for each population pair ranged from 20 to 25 and gener- 
ally increased when the Big Salmon River population was 
included (Fig. 4, Table 2). Overall, these regions appeared 
to be scattered randomly throughout the genome and were 
often significant in only one or two population compari- 
sons. Despite this pattern, three genomic regions on sepa- 
rate linkage groups (LG) were candidates for selection in 
more than half of the population comparisons (Fig. 4). 
These regions are LG2 at 70-78 cM, LG4 at 2-8 cM, and 
LG21 at 7-12 cM. 

Discussion 

We used RAD sequencing to characterize the genetic struc- 
ture, genomic divergence, and demography of five popula- 
tions of Chinook salmon from western Alaska. Patterns of 
genetic differentiation were similar to but more identifiable 
than in past studies (Gharrett et al. 1987; Templin et al. 
2011). Estimates of population N e ranged from 516 to 
infinity and appeared to be biased downward when loci 
that were physically linked were not removed. Regions of 
putative adaptive divergence appeared to be randomly dis- 
tributed across the genome with few shared areas of high 
divergence across populations, but we did find three geno- 
mic regions that displayed high divergence in multiple pop- 
ulations. Using genomic data, we were able to conduct 
individual assignment in populations where it was previ- 
ously unfeasible, discover genomic regions under putative 
selection, and estimate N e in populations with >1000 indi- 
viduals. Our approach therefore represents a significant 
improvement over previous studies employing fewer mark- 
ers and no linkage map. 

Population structure 

The largest genetic differentiation between populations in 
our data set existed between the Big Salmon River from 
the upper Yukon River and all other coastal populations. 
Chinook salmon from the upper Yukon River are thought 
to have genetically diverged from coastal populations after 
being isolated during the last glacial maximum (Olsen 
et al. 2011). Our results support this hypothesis and are 
consistent with those based on allozymes, microsatellites, 
and SNPs (Gharrett et al. 1987; Olsen et al. 2010; Templin 
et al. 2011). 

We also found high levels of divergence between the 
Tubutulik River in Norton Sound and all other popula- 
tions. This divergence was likely facilitated by the Nulato 
Hills, a small mountain range that separates the tributaries 
of Norton Sound from those of the Yukon River (Fig. 1), 
but could have also been influenced by environmental 
characteristics such as precipitation (Olsen et al. 2011). 



Populations from the lower Yukon, Kuskokwim, and 
Bristol Bay regions (Anvik River, Kogrukluk River, and 
Koktuli River) were least divergent, displaying pairwise P ST 
values < 0.01 for all population comparisons. The relatively 
small divergence we observed is consistent with other sal- 
monids in the region (Olsen et al. 2011; Garvin et al. 2013) 
and is somewhat expected given the surrounding environ- 
ment. Western Alaska is characterized by moisture-laden 
tundra and dynamic rivers that frequently change paths. 
When such stream captures occur, gene flow is facilitated 
between populations that were previously isolated. It is 
therefore likely that substantial historic and possibly 
continuing low-level gene flow has largely restricted genetic 
differentiation in this region (Seeb and Crane 1999). 

Nevertheless, we found genetic structure among the An- 
vik River, Kogrukluk River, and Koktuli River populations 
using both individual-based clustering methods and assign- 
ment tests. The Anvik River population displayed the high- 
est levels of divergence, forming a completely isolated 
cluster. This population may have diverged more quickly as 
a reflection of its relatively small estimated census and 
effective sizes (N = 1700, N e =516). The Kogrukluk River 
and Koktuli River populations, on the other hand, are at 
least four times larger than the Anvik River population and 
may not have been affected as substantially by genetic drift. 
Individuals that did not fall within major clusters were 
found in the Kogrukluk River and Anvik River popula- 
tions. These individuals may represent evidence of gene 
flow from genetically diverged upriver populations, but 
could have also resulted from within population variation. 
Individual-based PCA using 39 SNPs from Templin et al. 
(2011) did not resolve the population structure that was 
observed with the RAD data and displayed no apparent 
clustering pattern. These results emphasize the utility of 
genome-wide data when attempting to elucidate patterns 
of population differentiation. 

Assignment accuracies with both panels containing over 
10 000 SNPs were >89% for all populations, while assign- 
ment accuracies with the panels containing 39 SNPs were 
close to 50% on average per population. Additionally, the 
inclusion of outlier loci only slightly improved assignment 
accuracy. These results indicate that a large number of neu- 
tral SNPs were sufficient to achieve precise assignment and 
that, for our analysis, the number of SNPs used seemed to 
have more influence on assignment accuracies than the res- 
olution of those SNPs. Unfortunately, we were unable to 
evaluate the effectiveness of small panels of high- resolution 
SNPs compared to large panels of neutral SNPs because 
this type of analysis requires the use of a training and hold- 
out data set (Anderson 2010), which was not feasible with 
the sample sizes in our study. 

The patterns of population divergence observed here are 
consistent with previous studies, suggesting structuring of 
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Chinook salmon populations on regional scales (Templin 
et al. 2011). Despite this pattern, sampling additional pop- 
ulations from each region would likely improve estimates 
of population divergence and assignment accuracy. 

Demography 

Estimating and interpreting N e in salmon populations 
using single samples can be difficult because multiple 
cohorts are often present (Waples 1990). N e estimates 
therefore reflect a value somewhere between the effective 
number of breeders in a given year and the effective num- 
ber of breeders per generation. We divided N e by the cen- 
sus size (escapement) per year (N) and the census size per 
generation (NG) to account for both of these possibilities 
when comparing N e to census size. The NJN and NJNG 
ratios were highly variable across our populations, indicat- 
ing that effective and census size are not well correlated in 
our study system. 

A meta-analysis of 251 estimates of NJN found a median 
value of 0.14 and also showed that NJN ratios are generally 
larger in smaller populations (Palstra and Ruzzante 2008). 
Larger NJN ratios in smaller populations were also 
observed in our data. For example, the Anvik River had a 
census size of 1700 and NJN of 0.30, while the Kogrukluk 
River had a census size of 12 000 and an NJN of 0.17. This 
trend, however, was not consistent in the Tubutulik River 
population which had a census size of 3100 and NJN ratio 
of0.62. 

The large NJN ratio in the Tubutulik River population 
may have been due to gene flow from proximate popula- 
tions which can introduce additional genetic diversity and 
inflate estimates of N e (Palstra and Ruzzante 2011). The 
Tubutulik River is a small river in Norton Bay, which con- 
tains at least five additional salmon-producing rivers. Gene 
flow among subpopulations in this region may be quite 
common and could therefore have resulted in the larger 
than expected NJN estimates that we observed. Gene flow 
from proximate populations may also be inflating N e esti- 
mates from the Koktuli River and the Big Salmon River as 
both of these collections have census sizes close to 5000, 
but N e estimates with confidence intervals including infin- 
ity. It is important to note that estimates of census size are 
approximate and may not be completely representative. 
Nevertheless, our results suggest that census size is not an 
adequate predictor of effective size, especially in popula- 
tions that may belong to a larger metapopulation. 

Removing comparisons between loci on the same link- 
age group appeared to have a nonuniform effect on esti- 
mates of N e with larger estimates being more affected by 
removing linkage. For example, the estimate of N e for the 
Anvik River population, the smallest population in the 
study, only changed by 10 when linked comparisons were 



removed, whereas the estimate for the Big Salmon River 
changed by almost 9000. This relatively small bias for 
small populations was also found by Sved et al. (2013) 
and is expected given that in small populations, the signal 
of linkage disequilibrium due to genetic drift should be 
large compared to the signal due to physical linkage. 

It also appears that N e estimates for populations of simi- 
lar size can be affected nonuniformly by physically linked 
loci. Specifically, estimates of N e for the Tubutulik River 
displayed a larger downward bias when physically linked 
loci were included than estimates for the Kogrukluk River, 
even though the effective sizes for these populations were 
similar with unlinked loci. The nonuniform effects we 
observed when removing physically linked loci may be due 
to historic signals of N e that have been preserved due to 
linkage (Hill 1981; Tenesa et al. 2007). 

Estimating N e in large populations (N e > 1000) with 
10-100 genetic markers is extremely challenging due to the 
small amount of linkage disequilibrium caused by drift 
(Waples and Do 2010), but, with thousands of markers, 
accurately characterizing the signal of drift and estimating 
N e may be feasible (AUendorf et al. 2010). Estimates of N e 
from our study were infinite for three of five populations 
with 39 SNPs, but only infinite for one population with 
1118 SNPs. Additionally, all estimates with 39 SNPs, but 
only two estimates with 1118 SNPs, displayed confidence 
intervals including infinity, and confidence intervals were 
consistently smaller with 1118 SNPs. Our results indicate 
that genomic data can improve the accuracy of N e esti- 
mates in large populations, aiding management in many 
species. 

Putative adaptive divergence 

We identified 6.7% of SNPs in our data set as outliers, con- 
sistent with past studies identifying 5-10% of markers as 
candidates for directional selection (Nosil et al. 2009). In 
general, patterns of divergence observed from our outliers 
were similar to patterns obtained using neutral markers. 
BLAST annotation of outlier loci revealed a high frequency 
of transposable elements, similar to the overall data set. 
These transposable elements are quite common in teleost 
fish and are generally assumed to behave as neutral markers 
(Radice et al. 1994) although some evidence suggests that 
they can be adaptively important (Casacuberta and 
Gonzalez 2013). 

Tests for genomic regions under putative selection 
revealed that these regions appeared to be spread randomly 
across the genome with few common 'hot spots' among 
populations. This pattern is consistent with Bourret et al. 
(2013), who found a similar distribution across the Atlantic 
salmon genome. Despite the apparent randomness, three 
regions were differentiated in more than five of ten popula- 
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tion comparisons. These highly divergent regions may rep- 
resent adaptively significant areas of the Chinook salmon 
genome and should be targets of future research. Popula- 
tion comparisons that included the Big Salmon River gen- 
erally displayed the largest number of divergent regions. 
Although these regions likely represent adaptively signifi- 
cant areas of the genome, it is possible that at least a por- 
tion of them resulted from genetic drift as a result of 
isolation during the last glacial maximum (Olsen et al. 
2011). Research aimed at disentangling signatures of drift 
from those of natural selection should therefore focus on 
systems with low neutral divergence across heterogeneous 
environments (Nielsen et al. 2009). 

Management and conservation implications for western 
Alaska 

Returns of Chinook salmon to western Alaska have fallen 
dramatically over the last decade compared to their long- 
term average (ADF&G 2013). This precipitous decline 
has prompted multiple fisheries closures causing extensive 
economic hardship and threatening subsistence catches 
for natives of the western Alaska region. Some of these 
closures stem from the inability of fisheries managers to 
differentiate a late run that is of normal size from a small 
run that is returning at a normal date. One way that 
managers can differentiate these two scenarios is with 
stock composition estimates facilitated by panels of high- 
throughput SNPs. Specifically, stock composition esti- 
mates from mixed-stock fisheries and test fisheries on the 
high seas can be used to monitor the contribution of 
each stock in real time, helping to inform the need for 
fisheries closures and generally improving fisheries man- 
agement (Seeb et al. 2000; Smith et al. 2005; Dann et al. 
2013). Despite this potential utility, tools for genetic 
stock identification in marine waters of western Alaska 
have been severely hampered by lack of genetic diver- 
gence among regions (Templin et al. 2011). Our data 
provide the first evidence that assignment to region of 
origin is feasible in western Alaska despite low levels of 
divergence. Although it is not currently possible to screen 
10 000 loci on thousands of individuals, a subset of our 
RAD loci that show high levels of divergence can be used 
to construct a high-throughput SNP panel to differentiate 
stocks in this region (c.fi, Ackerman et al. 2011). 

This high-throughput SNP panel could also be used to 
investigate the migration and distribution patterns of Chi- 
nook salmon on the high seas (e.g., Murphy et al. 2009; Lar- 
son et al. 2013). Patterns of productivity in the marine 
environment are thought to be a major cause of the fluctua- 
tions in abundance observed in Chinook salmon from wes- 
tern Alaska (Farley et al. 2005). Despite this variability, 
most stock assessment models assume a constant marine 



mortality rate across all stocks. The ability to monitor stock- 
specific abundance on the high seas could provide impor- 
tant information for stock assessment models which is 
currently unavailable. Additionally, stock composition esti- 
mates could be used to monitor the impact of Chinook 
salmon interception in the Bering Sea pollock fishery; this 
fishery has captured as many as 100 000 Chinook salmon in 
a single year (Gisclair 2009). In summary, our results repre- 
sent the first step toward a panel of high-throughput SNPs 
that can be used to conduct genetic stock identification and 
improve stock-specific management in the western Alaska 
region. 

Applicability to other study systems 

Our study demonstrates the utility of genomic data when 
attempting to differentiate closely related populations and 
estimate demographic parameters. The methods we 
employed will be especially applicable in marine species, 
which are often characterized by low genetic differentiation 
and large population sizes (Waples 1998; Nielsen and 
Kenchington 2001). For example, individual-based analyses 
with thousands of markers can provide extremely accurate 
estimates of individual genetic variation. Additionally, this 
method can shed light on patterns of connectivity by iden- 
tifying migrants and admixture within populations. 

Estimates of N e in large marine populations can also be 
improved using approaches similar to ours (e.g., Gruenthal 
et al. 2014). Dense linkage maps have already been devel- 
oped for many marine species including cod (Hubert et al. 
2010), flounder (Castano-Sanchez et al. 2010), and shrimp 
(Du et al. 2010). By combining these linkage maps with 
genomic data, it may be possible to accurately estimate N e 
and NJN in many economically important marine species. 
These estimates can provide important insights into the 
adaptive potential of marine populations and can be used 
to inform management (Hare et al. 2011). 

Summary 

Our results demonstrated fine-scale structure between 
regions in western Alaska. This structure allowed us to 
assign fish back to their region of origin with greater than 
90% accuracy, representing a significant improvement over 
past studies. We also estimated N e for each population 
using a novel method for nonmodel organisms. Estimates 
were generally large and provided some evidence that meta- 
population dynamics influence demography in this region. 
Investigation of loci and genomic regions under putative 
selection found three potential regions of adaptive diver- 
gence. The methods described in our study will be particu- 
larly applicable to marine species or any species where large 
population size and shallow structure are common. 
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